Fast Exact String Matching on the GPU
Abstract
We present a string-matching program that runs on the GPU. Our program, Cmatch, achieves a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version. String matching has a long history in computational biology, with roots in finding similar proteins and gene sequences in a database of known sequences. The explosion in sequence data available in the 80s and 90s motivated the development of ever faster techniques for searching for similar sequences, and ultimately led to the parallelized execution of string matching algorithms using sophisticated data structures called suffix trees. Suffix trees can be constructed in time proportional to the length of the corpus, and provide exact matching of a query in time proportional to the length of the query, independent of the size of the corpus. Here, we present our string-matching kernel for the Compute Unified Device Architecture (CUDA), which searches a suffix tree in parallel to find exact matches for a set of query strings. We compare our GPGPU suffix tree search to a serial CPU version of the algorithm and to analogous components of the widely used CPU program MUMmer, and we explore issues associated with storing a suffix tree in a graphics card’s memory and with distributing data among the GPU’s processing units.
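The query-time property described in the abstract can be illustrated with a small Python sketch. This is not Cmatch and not a true suffix tree: it is a naive, uncompressed suffix trie, which takes quadratic time and space to build, whereas a real suffix tree is built in linear time. The lookup, however, has the same shape: it walks one edge per query character, so its cost depends on the query length and not on the corpus size. The names `SuffixTrie`, `find`, and `match_all` are hypothetical and chosen only for illustration.

```python
class _Node:
    """One trie node: outgoing edges plus the start positions of every
    corpus substring that ends at this node."""
    __slots__ = ("children", "starts")

    def __init__(self):
        self.children = {}
        self.starts = []


class SuffixTrie:
    """Naive suffix trie (illustrative only; O(n^2) construction)."""

    def __init__(self, text):
        self.text = text
        self.root = _Node()
        # Insert every suffix of the corpus, recording its start position
        # at each node along the path.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.children.setdefault(ch, _Node())
                node.starts.append(start)

    def find(self, query):
        """Return all start positions of exact matches of `query`.

        The walk takes one step per query character, independent of the
        corpus length. An empty query returns an empty list.
        """
        node = self.root
        for ch in query:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.starts)


def match_all(trie, queries):
    """Look up each query independently.

    Because no lookup depends on any other, each one could in principle
    be assigned to its own worker (for example, one GPU thread per query).
    """
    return {q: trie.find(q) for q in queries}
```

For example, `match_all(SuffixTrie("banana"), ["ana", "nan", "x"])` returns `{"ana": [1, 3], "nan": [2], "x": []}`. The independence of the lookups is what makes the search a natural fit for parallel execution over a set of queries.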
Similar resources
Faster Multiple Pattern Matching System on GPU based on Bit-Parallelism
In this paper, we propose a fast string matching system using the GPU for large-scale string matching. The key to our proposed system is the use of a bit-parallel pattern matching approach for compact NFA representation and fast simulation of NFA transitions on the GPU. In the experiments, we show the usefulness of our proposed pattern matching system.
Fast exact string matching algorithms
String matching is the problem of finding all the occurrences of a pattern in a text. We propose a very fast new family of string matching algorithms based on hashing q-grams. The new algorithms are the fastest in many cases, in particular on small alphabets. © 2007 Elsevier B.V. All rights reserved.
Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, we can use tomosynthesis systems to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...
Evaluation and Improvement of Fast Algorithms for Exact Matching on Genome Sequences
With the availability of large amounts of DNA data, exact matching of nucleotide sequences has become an important application in modern computational biology and in metagenomics. In the last decade, several efficient solutions for the exact string matching problem have been developed, and most of them are very fast in practical cases. However, when the length of the pattern is short or the alpha...
To Use or Not to Use: Graphics Processing Units for Pattern Matching Algorithms
String matching is an important part of today’s computer applications, and the Aho-Corasick algorithm is one of the main string matching algorithms used to accomplish it. This paper discusses when GPUs can be used for string matching applications, using the Aho-Corasick algorithm as a benchmark. We have to identify the best unit to run our string matching algorithm according to the perform...